{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Invoking an ML API\n",
    "\n",
    "This notebook demonstrates how to invoke a deployed ML model (in this case, the Google Cloud Natural Language API)\n",
    "from a batch or streaming pipeline\n",
    "\n",
    "We will use Apache Beam."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Install Beam\n",
    "\n",
    "Restart the kernel after installing Beam"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Note: you may need to restart the kernel to use updated packages.\n"
     ]
    }
   ],
   "source": [
    "%pip install --upgrade --quiet apache-beam[gcp]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Try out Beam"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "!rm -rf output.txt* beam-temp*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 119,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'DONE'"
      ]
     },
     "execution_count": 119,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import apache_beam as beam\n",
    "from apache_beam.ml.gcp import naturallanguageml as nlp\n",
    "\n",
    "def parse_nlp_result(response):\n",
    "    \"\"\"\n",
    "Pulls required info from a response that looks like this:\n",
    "\n",
    "sentences {\n",
    "  text {\n",
    "    content: \"I love walking along the Seine.\"\n",
    "  }\n",
    "  sentiment {\n",
    "    magnitude: 0.699999988079071\n",
    "    score: 0.699999988079071\n",
    "  }\n",
    "}\n",
    "entities {\n",
    "  name: \"Seine\"\n",
    "  type: LOCATION\n",
    "  metadata {\n",
    "    key: \"mid\"\n",
    "    value: \"/m/0f3vz\"\n",
    "  }\n",
    "  metadata {\n",
    "    key: \"wikipedia_url\"\n",
    "    value: \"https://en.wikipedia.org/wiki/Seine\"\n",
    "  }\n",
    "  salience: 1.0\n",
    "  mentions {\n",
    "    text {\n",
    "      content: \"Seine\"\n",
    "      begin_offset: 25\n",
    "    }\n",
    "    type: PROPER\n",
    "  }\n",
    "}\n",
    "document_sentiment {\n",
    "  magnitude: 0.699999988079071\n",
    "  score: 0.699999988079071\n",
    "}\n",
    "language: \"en\"\n",
    "    \"\"\"\n",
    "    def get_entity_value(entities, search_key):\n",
    "        for entity in entities:\n",
    "            return (entity.metadata[search_key])\n",
    "        return ''\n",
    "    \n",
    "    return [\n",
    "        # response, # entire string\n",
    "        response.sentences[0].text.content, # first sentence\n",
    "        [entity.name for entity in response.entities], # all entities\n",
    "        [entity.metadata['wikipedia_url'] for entity in response.entities], # urls\n",
    "        response.language,\n",
    "        response.document_sentiment.score\n",
    "    ]\n",
    "\n",
    "\n",
    "features = nlp.types.AnnotateTextRequest.Features(\n",
    "    extract_entities=True,\n",
    "    extract_document_sentiment=True,\n",
    "    extract_syntax=False\n",
    ")\n",
    "\n",
    "p = beam.Pipeline()\n",
    "(p \n",
    " | beam.Create(['Has President Obama been to Paris?', 'Sophie loves walking along the Seine.', \"C'est terrible\"])\n",
    " | beam.Map(lambda x : nlp.Document(x, type='PLAIN_TEXT'))\n",
    " | nlp.AnnotateText(features)\n",
    " | beam.Map(parse_nlp_result)\n",
    " | beam.io.WriteToText('output.txt')\n",
    ")\n",
    "result = p.run()\n",
    "result.wait_until_finish()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 120,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['Has President Obama been to Paris?', ['Obama', 'Paris'], ['https://en.wikipedia.org/wiki/Barack_Obama', 'https://en.wikipedia.org/wiki/Paris'], 'en', 0.0]\n",
      "[\"C'est terrible\", [], [], 'fr', -0.8999999761581421]\n",
      "['Sophie loves walking along the Seine.', ['Sophie', 'Seine'], ['', 'https://en.wikipedia.org/wiki/Seine'], 'en', 0.800000011920929]\n"
     ]
    }
   ],
   "source": [
    "!cat output.txt*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Changing input to BigQuery and running on Cloud\n",
    "\n",
    "Use DataflowRunner"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>text</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>I think there's a major problem with this, and...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Speaking of Rails, there are other options in ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>I don't see the point in this as a serious pro...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Nope. It is a nice package, but there's too ma...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>I'll be perfectly honest: what is popular on D...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Also keep in mind that Python has much better ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>I'm just wondering, what specifically about th...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Just get a basic knowledge of each of them.&lt;p&gt;...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Haven't we already discussed this? Google is n...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>The general idea is interesting and possibly e...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                text\n",
       "0  I think there's a major problem with this, and...\n",
       "1  Speaking of Rails, there are other options in ...\n",
       "2  I don't see the point in this as a serious pro...\n",
       "3  Nope. It is a nice package, but there's too ma...\n",
       "4  I'll be perfectly honest: what is popular on D...\n",
       "5  Also keep in mind that Python has much better ...\n",
       "6  I'm just wondering, what specifically about th...\n",
       "7  Just get a basic knowledge of each of them.<p>...\n",
       "8  Haven't we already discussed this? Google is n...\n",
       "9  The general idea is interesting and possibly e..."
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%%bigquery\n",
    "SELECT text FROM `bigquery-public-data.hacker_news.comments`\n",
    "WHERE author = 'AF' LIMIT 10"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 74,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Overwriting nlp_pipeline.py\n"
     ]
    }
   ],
   "source": [
    "%%writefile nlp_pipeline.py\n",
    "\n",
    "PROJECT='ai-analytics-solutions'\n",
    "BUCKET='ai-analytics-solutions-kfpdemo'\n",
    "REGION='us-central1'\n",
    "\n",
    "from datetime import datetime\n",
    "import apache_beam as beam\n",
    "\n",
    "def parse_nlp_result(response):\n",
    "    return [\n",
    "        # response, # entire string\n",
    "        response.sentences[0].text.content,\n",
    "        response.language,\n",
    "        response.document_sentiment.score\n",
    "    ]\n",
    "\n",
    "def run():\n",
    "    from apache_beam.ml.gcp import naturallanguageml as nlp\n",
    "    \n",
    "    features = nlp.types.AnnotateTextRequest.Features(\n",
    "        extract_entities=True,\n",
    "        extract_document_sentiment=True,\n",
    "        extract_syntax=False\n",
    "    )\n",
    "    options = beam.options.pipeline_options.PipelineOptions()\n",
    "    google_cloud_options = options.view_as(beam.options.pipeline_options.GoogleCloudOptions)\n",
    "    google_cloud_options.project = PROJECT\n",
    "    google_cloud_options.region = REGION\n",
    "    google_cloud_options.job_name = 'nlpapi-{}'.format(datetime.now().strftime(\"%Y%m%d-%H%M%S\"))\n",
    "    google_cloud_options.staging_location = 'gs://{}/staging'.format(BUCKET)\n",
    "    google_cloud_options.temp_location = 'gs://{}/temp'.format(BUCKET)\n",
    "    options.view_as(beam.options.pipeline_options.StandardOptions).runner = 'DataflowRunner' # 'DirectRunner'\n",
    "\n",
    "    p = beam.Pipeline(options=options)\n",
    "    (p \n",
    "     | 'bigquery' >> beam.io.Read(beam.io.BigQuerySource(\n",
    "         query=\"SELECT text FROM `bigquery-public-data.hacker_news.comments` WHERE author = 'AF' AND LENGTH(text) > 10\",\n",
    "         use_standard_sql=True))\n",
    "      | 'txt'      >> beam.Map(lambda x : x['text'])\n",
    "      | 'doc'      >> beam.Map(lambda x : nlp.Document(x, type='PLAIN_TEXT'))\n",
    "    #  | 'todict'   >> beam.Map(lambda x : nlp.Document.to_dict(x))\n",
    "      | 'nlp'      >> nlp.AnnotateText(features, timeout=10)\n",
    "      | 'parse'    >> beam.Map(parse_nlp_result)\n",
    "      | 'gcs'      >> beam.io.WriteToText('gs://{}/output.txt'.format(BUCKET), num_shards=1)\n",
    "    )\n",
    "    result = p.run()\n",
    "    result.wait_until_finish()\n",
    "\n",
    "if __name__ == '__main__':\n",
    "    run()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 75,
   "metadata": {},
   "outputs": [],
   "source": [
    "!python3 nlp_pipeline.py"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 76,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[\"I think there's a major problem with this, and it is that discussions come in all shapes and sizes across the web.<p>Think about it.\", 'en', -0.20000000298023224]\n",
      "['It is just a joke that Facebook could be valued at $6 billion.', 'en', -0.5]\n",
      "[\"This article doesn't make too much sense to me.<p>First of all, as Sam mentioned, the companies aren't that much different in size.\", 'en', -0.4000000059604645]\n",
      "['&#62; If only a real Lisp would come along that had the leadership, library, community and documentation of Python.<p>Someone was asking for that just the other day.', 'en', 0.0]\n",
      "[\"If the original poster has his service out when he does this, it could end up being both very bad PR for her and very good PR for him.<p>In fact, her ripping off his idea could end up being better publicity than he would've gotten anywhere else.\", 'en', -0.699999988079071]\n",
      "[\"I've had my share of bad experiences with Apple products/computers.\", 'en', 0.0]\n",
      "['Of course.', 'en', 0.0]\n",
      "[\"I don't think so.\", 'en', -0.10000000149011612]\n",
      "['This is just a single example.', 'en', 0.0]\n",
      "[\"The general idea is interesting and possibly even right, but bloggers tend to overvalue and exaggerate blogging's reach.\", 'en', 0.4000000059604645]\n",
      "[\"You know I really do like apt, but it isn't perfect, and 'rough around the edges' isn't a bad description for it (I know he meant Ubuntu as a whole).<p>I've had multiple times that by using apt-get autoremove, my entire Ubuntu installation has been fubared.\", 'en', 0.0]\n",
      "[\"I don't mean this to come out as negatively as it might sound, but JRuby being faster than Ruby 1.8 isn't saying much; MRI is pretty slow.\", 'en', -0.699999988079071]\n",
      "['\"Yes, but what you forget Paul backspace backspace backspace, Paul, is that Microsoft Vista has voice recognition.', 'en', 0.0]\n",
      "[\"It is definitely something to consider, and while I don't think translating Python to Common Lisp would be all that difficult, I do think that getting the C libraries that Python wraps working with CL might be a significant effort.\", 'en', 0.4000000059604645]\n",
      "[\"You are right, it would've been a poor argument if that's all they said.\", 'en', -0.10000000149011612]\n",
      "['Seems pretty silly.', 'en', -0.5]\n",
      "[\"I think the idea behind a 'web OS' is horrible.\", 'en', -0.20000000298023224]\n",
      "[\"I'm not a huge fan of it, but my immediate thought was 'emacs'.\", 'en', -0.10000000149011612]\n",
      "['The article is correct.', 'en', 0.0]\n",
      "['There are advantages to both approaches.', 'en', -0.10000000149011612]\n",
      "['Speaking of which, have you guys seen what Facebook has turned into with the advent of the platform?', 'en', -0.30000001192092896]\n",
      "[\"Don't be so sure either of those are that important.<p>You can get a laptop with comparable power to some of the more expensive desktops now for about the same price.\", 'en', -0.5]\n",
      "['Is he serious about this being the coolest application to date?', 'en', 0.0]\n",
      "['You realize what happened after the boom of the 1920s, right?', 'en', -0.30000001192092896]\n",
      "['I thought I made it clear.', 'en', 0.30000001192092896]\n",
      "[\"So basically Digg does what everyone else does?<p>I've always wondered if Digg was written using Python or another language instead of PHP, if they could squeeze better performance out of it.\", 'en', -0.6000000238418579]\n",
      "['&#62; It is increasingly dangerous to have a child past 30.', 'en', -0.30000001192092896]\n",
      "[\"Well, if you wanted to ship a desktop application or such, you'd have to open source your code.\", 'en', -0.4000000059604645]\n",
      "[\"Kind of an old post, things have changed for both Rails/Django since then.<p>Though I'd like to comment on part of his article.\", 'en', 0.0]\n",
      "['Smalltalk is a really excellent language.', 'en', 0.10000000149011612]\n",
      "['It is the internet.', 'en', 0.20000000298023224]\n",
      "['That clip struck me as Kevin being really shady.', 'en', -0.6000000238418579]\n",
      "['Is it?', 'en', 0.10000000149011612]\n",
      "['Please be smart and go with the ThinkPad (parcticularly a T61p).<p>Right now they have a 25% off sale on them, not to mention you can get another 5% off if you have a Visa card, or are a student, or a number of other things.', 'en', 0.0]\n",
      "['Speaking of Rails, there are other options in the Python world besides Django.<p>Pylons is a very Rails-y framework with the difference being that it is made to be easy to customize.', 'en', 0.0]\n",
      "[\"I don't see a problem with it.\", 'en', 0.10000000149011612]\n",
      "['What is wrong with just voting up stories you like?', 'en', -0.4000000059604645]\n",
      "[\"You are right about GEdit's power.\", 'en', 0.5]\n",
      "[\"I just wanted to say that if you ever plan on running Linux on your machine, don't get a Mac.<p>Ubuntu in particular works, but there's a lot of hacking to get it to do so and little issues abound (at least that's my experience).\", 'en', -0.20000000298023224]\n",
      "['Money is overrated.', 'en', -0.5]\n",
      "['Nope.', 'en', 0.20000000298023224]\n",
      "[\"Ubuntu really isn't difficult to setup at all on most PCs.\", 'en', 0.5]\n",
      "['What is the best alternative to an RDMS in most cases?', 'en', 0.800000011920929]\n",
      "[\"I'll be perfectly honest: what is popular on Digg is in no way something to be proud of.<p>Digg is a complete waste of time.\", 'en', -0.6000000238418579]\n",
      "['What kind of apps do you mean by this?', 'en', -0.30000001192092896]\n",
      "[\"Just get a basic knowledge of each of them.<p>CL is pretty special and has features you just won't find in most other languages (macros, interactive error-handling, generic functions), and raw speed (SBCL).<p>OCaml...I hear discussion about it, but personally having evaluated it, I don't know what kind of a future it has (I really doubt it will ever be a 'big' thing).\", 'en', -0.10000000149011612]\n",
      "['Hmm.', 'en', 0.0]\n",
      "[\"Please tell me what is wrong with my statement.<p>I don't think it of respectful users, I think it of Digg users, which have never been shown to be intelligent or respectful of either the web site they use or anyone else for that matter.\", 'en', -0.699999988079071]\n",
      "[\"Heh, well if you are building YouTube, Dreamhost isn't going to do much good either. :)\", 'en', 0.699999988079071]\n",
      "['How are people monetizing Facebook aps?', 'en', -0.30000001192092896]\n",
      "['One problem with this: people expect to do that in video games.', 'en', 0.10000000149011612]\n",
      "[\"I don't see the equivalence.\", 'en', -0.20000000298023224]\n",
      "['Sorry...double post.', 'en', -0.10000000149011612]\n",
      "[\"I hold the same disdain for walled networks you do, but ultimately I use Facebook and others do as well simply because it is convenient and easy to use.<p>I also wouldn't base my main application on it, but as a developer I don't mind it for smaller apps.\", 'en', -0.4000000059604645]\n",
      "[\"I've thought of this before, but the issue is, what if people don't pay up?\", 'en', -0.10000000149011612]\n",
      "[\"Yeah he definitely didn't start Reddit.\", 'en', -0.30000001192092896]\n",
      "[\"I'm just wondering, what specifically about the Mac UI gives you significant productivity boosts?\", 'en', 0.699999988079071]\n",
      "['The issue is EFI.', 'en', -0.10000000149011612]\n",
      "[\"Haven't we already discussed this?\", 'en', -0.20000000298023224]\n",
      "['When did he ever imply that?', 'en', -0.30000001192092896]\n",
      "['Wow.', 'en', -0.10000000149011612]\n",
      "['I am reminded of similar hyping and grandiose claims that Aaron made when he announced both Infogami and web.py.', 'en', -0.5]\n",
      "[\"I don't see the point in this as a serious production language.\", 'en', 0.0]\n",
      "['Can someone explain to me why all these startup companies end up in California?', 'en', 0.20000000298023224]\n",
      "['The GPL licensing would be an issue for some.<p>Edit: Unfortunately Logix seems to have been dead since 2005.<p><a href=\"http://common-lisp.net/project/python-on-lisp/\" rel=\"nofollow\">http://common-lisp.net/project/python-on-lisp/</a>', 'en', 0.0]\n",
      "[\"I'm sure you know that the idea of using code in templates is controversial anyway.\", 'en', 0.5]\n",
      "[\"Unless you've really got a Ruby fetish, there's just not much of a reason to use it.<p>If you want a similar language with similar productivity, Python gives you that and speed, Unicode support at the language level, and lots of libraries.<p>And in web frameworks alone you've got Django, Pylons, Turbogears, web.py, etc.\", 'en', 0.0]\n",
      "[\"You can directly compare the speed of an application or a language.<p>It is not subjective to say 'C is faster than Perl' or 'Perl is faster than Ruby'.\", 'en', 0.0]\n",
      "['Paul just how much impact do you think that Reddit being initially implemented in Lisp had?<p>I mean, sure, relative numbers of Lisp users might have been pretty small, but the very fact that it was written in Lisp may have gotten fans and detractors to at least hear about the site (probably the hardest thing to do), and then once they checked it out start using it.<p>So how important do you think that was to initial popularity?', 'en', -0.20000000298023224]\n",
      "['How many books do you need for a web framework?<p>Django is pretty simple.', 'en', -0.20000000298023224]\n",
      "[\"That isn't always an option (truth be told it isn't an option many people have).\", 'en', -0.10000000149011612]\n",
      "[\"I've recognized this 'mental transaction cost' for awhile, but never been able to put it into words.\", 'en', 0.20000000298023224]\n",
      "[\"I don't mean to be rude about it, but groopvine.com is a terrible domain name, imo.\", 'en', -0.4000000059604645]\n",
      "['People continue to bring this post up.', 'en', -0.10000000149011612]\n",
      "[\"I'm fairly certain it isn't revolutionary, it just shows the rampant immaturity of Digg's user base and their mob justice that has no respect and knows no bounds.\", 'en', -0.6000000238418579]\n",
      "[\"I've used both.\", 'en', 0.0]\n",
      "['Never.', 'en', -0.4000000059604645]\n",
      "[\"Google isn't an ad-supported business, though.\", 'en', -0.20000000298023224]\n",
      "['Hey guys...<p>I own an Apple and have been trying to get Ubuntu running on it.', 'en', -0.10000000149011612]\n",
      "['Those are called blogs, no?', 'en', -0.10000000149011612]\n",
      "['You are right about the runtimes.', 'en', -0.10000000149011612]\n",
      "['Hey gwen...what do you mean by this: \"If JS is Lisp in disguise\"?', 'en', 0.0]\n",
      "[\"As I mentioned in another post, I'm aware of the qualities of SBCL.\", 'en', -0.20000000298023224]\n",
      "['Also keep in mind that Python has much better (quantity and quality) libraries.<p>Compare, for example, SQLAlchemy to ActiveRecord.', 'en', -0.20000000298023224]\n",
      "['This is why if you have an idea you might as well do it.', 'en', -0.4000000059604645]\n",
      "['It is unfortunate because it seems that those that succeed (especially in the computer industry) are the biggest jerks.<p>The article does mention Gates, who was involved in some pretty shady stuff when it comes to establishing Windows.', 'en', -0.4000000059604645]\n",
      "['Because projectors require a few things that TVs do not.<p>1.', 'en', -0.4000000059604645]\n"
     ]
    }
   ],
   "source": [
    "!gsutil cat gs://$BUCKET/output.txt*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Copyright 2020 Google Inc. Licensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
